Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking

Authors

  • Rahul Sharma
  • Tanaya Guha
  • Gaurav Sharma
Abstract

Public speaking is an important aspect of human communication and interaction. Most computational work on public speaking concentrates on analyzing the spoken content and the verbal behavior of the speakers. While the success of public speaking depends largely on the content of the talk and on verbal behavior, non-verbal (visual) cues, such as gestures and physical appearance, also play a significant role. This paper investigates the importance of visual cues by estimating their contribution towards predicting the popularity of a public lecture. For this purpose, we constructed a large database of more than 1800 TED talk videos. As a measure of the popularity of the TED talks, we leverage the corresponding (online) viewers’ ratings from YouTube. Visual cues related to facial and physical appearance, facial expressions, and pose variations are extracted from the video frames using convolutional neural network (CNN) models. Thereafter, an attention-based long short-term memory (LSTM) network is proposed to predict the video popularity from the sequence of visual features. The proposed network achieves state-of-the-art prediction accuracy, indicating that visual cues alone contain highly predictive information about the popularity of a talk. Furthermore, our network learns a human-like attention mechanism, which is particularly useful for interpretability: it shows how attention varies over time and across different visual cues, indicating their relative importance.
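
The abstract describes attention that varies over time and across visual cues. As a rough illustration only, the sketch below applies softmax attention first over the frames of each channel and then over the channels. It is not the authors' network: there is no CNN or LSTM here, and the channel names, feature values, and relevance scores are hypothetical placeholders for quantities a trained model would learn.

```python
import math

def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(vectors, scores):
    """Return the attention weights and the weighted sum of the vectors."""
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return weights, pooled

# Hypothetical per-frame features for two visual channels (3 frames, 2 dims).
channels = {
    "face": [[0.2, 0.1], [0.9, 0.4], [0.3, 0.3]],
    "pose": [[0.5, 0.5], [0.4, 0.6], [0.1, 0.2]],
}
# Hypothetical relevance scores; a trained network would produce these.
temporal_scores = {"face": [0.0, 2.0, 0.0], "pose": [1.0, 1.0, 1.0]}
channel_scores = {"face": 1.5, "pose": 0.5}

names = list(channels)
temporal_weights = {}
pooled_per_channel = []
for name in names:
    # Temporal attention: which frames matter within this channel.
    weights, pooled = attend(channels[name], temporal_scores[name])
    temporal_weights[name] = weights
    pooled_per_channel.append(pooled)

# Channel attention: relative importance of each visual cue.
channel_weights, talk_vector = attend(
    pooled_per_channel, [channel_scores[n] for n in names]
)
```

Inspecting `temporal_weights` and `channel_weights` is what makes such a model interpretable: the former shows which frames dominated within a cue, the latter how much each cue contributed to the final representation `talk_vector`.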

Similar resources

Modeling focus of attention for meeting indexing based on multiple cues

A user's focus of attention plays an important role in human-computer interaction applications, such as a ubiquitous computing environment and intelligent space, where the user's goal and intent have to be continuously monitored. We are interested in modeling people's focus of attention in a meeting situation. We propose to model participants' focus of attention from multiple cues. We have deve...

Analyzing the Effect Adding an Active Feedback Network with an Inductive behavior to a Common-Gate Topology as a Transimpedance Amplifier for Low-Power and Wide-Band Communication Applications

  Common Gate (CG) topologies are commonly used as the first stage in Transimpedance Amplifiers (TIA), due to their low input resistance. But, this structure is not solely used as a TIA and comes with other topologies such as differential amplifiers or negative resistances and capacitances. This paper deals with analyzing the effect of adding an active feedback network to a common gate topology...

Structured Network Public Spaces a Step Toward Integration of Urban

Network of public spaces composes of a network of interconnected land use and various elements of the city, such as synthetic and natural which shows the city as a whole. Network structure of public spaces is important because understanding this network as a structure presents us the formation of the city. This paper attempts to define the status of the network of public spaces in the city stru...

Effectiveness of working memory intervention in behavior inhibition and visual working memory of children with Attention Deficit/ impulsive subtype Disorder (ADHD-I)

The aim of the current research study was to determine the effectiveness of working memory intervention on behavioral inhibition and visual working memory in children with symptoms of attention-deficit/impulsivity disorder in Selseleh city. The method was a quasi-experimental pre-test-post-test design with a control group. The statistical population consisted of all boy students of 8-12 years o...

Transforming neutral visual speech into expressive visual speech

We present a method for transforming neutral visual speech sequences into realistic expressive visual speech sequences. By applying Independent Component Analysis (ICA) to visual features extracted from time aligned neutral and equivalent expressive sequences, a model that separates speech from expression can be learned. Analyzing the behavior of different speaking styles in terms of this model...

Journal:
  • CoRR

Volume: abs/1707.06830

Publication date: 2017